Quantization method and apparatus

ABSTRACT

Quantization for oversampled signals with an error minimization searches based upon clusters of possible sampling vectors where the clusters have minimal correlation and thereby decrease reconstruction error as a function of oversampling (redundancy) ratio.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional of application Ser. No. 12/184,873, filed Aug. 1, 2008 and claims the priority of U.S. provisional application Ser. No. 60/954,526, filed Aug. 7, 2007, the contents of which are herein incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION

The present invention relates to digital signal processing, and more particularly to architectures and methods for oversampling quantization such as in analog-to-digital conversion.

Analog-to-digital converters (ADCs) are used to convert analog signals, such as music or voice, to digital format to allow for compression (e.g., MP3) and transmission over digital networks. An ADC samples an input analog signal (makes discrete in time) and quantizes the samples (makes discrete in amplitude) to give a sequence of digital words. For playback, a digital-to-analog converter (DAC) reconstructs (approximately) the input analog signal from the sequence of digital words. If the sampling rate of the ADC is higher than the Nyquist rate for the input analog signal, then the sampling is essentially reversible; whereas, the quantization always loses information. Thus ADC have a problem of optimal quantization.

A commonly used type of ADC includes oversampling plus a sigma-delta modulator; FIG. 3A illustrates one possible implementation of a one-stage modulator and quantizer. In particular, input signal x(t) is oversampled to give the input digital sequence x(nT) which will be quantized to give the output quantized digital sequence y(nT). The modulator operation includes subtracting a scaled and delayed quantization error, c e((n−1)T), from the input samples prior to quantization; this feedback provides integration of the overall quantization error. That is, y(nT)=Q(u(nT)) e(nT)=y(nT)−u(nT) u(nT)=x(nT)−c e((n−1)T) Recursive substitution gives: e(nT)=y(nT)−x(nT)+c[y((n−1)T)−x((n−1)T)]++ . . . +c ^(k) [y((n−k)T)−x((n−k)T)]+ See Boufounos and Oppenheim, Quantization Noise Shaping on Arbitrary Frame Expansion, Proc. ICASSP, vol. 4, pp. 205-208 (2005) which notes an optimal value for the scaling constant c as sin c(1/r) where r is the oversampling ratio.

SUMMARY OF THE INVENTION

The present invention provides quantization for oversampled signals with an error minimization search based on clusters of possible sampling vectors.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1B illustrate a preferred embodiment system and method.

FIGS. 2A-2B are experimental results.

FIGS. 3A-3C show an ADC, a processor, and network communication.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

1. Overview

Preferred embodiment oversampling quantization methods cluster possible quantization vectors according to correlations and search clusters to determine quantization to minimize reconstruction errors. FIG. 1A shows functional blocks of a system (e.g., an ADC) using the quantization method.

Preferred embodiment systems (pipelined delta-sigma analog-to-digital converters, etc.) perform preferred embodiment methods with any of several types of hardware: digital signal processors (DSPs), general purpose programmable processors, application specific circuits, or systems on a chip (SoC) such as combinations of a DSP and a RISC processor together with various specialized programmable accelerators. FIG. 3B is an example of digital camera hardware including various ADCs. A stored program in an onboard or external (flash EEP)ROM or FRAM could implement the signal processing. Preferred embodiment analog-to-digital converters and digital-to-analog converters can provide coupling to the real world, modulators and demodulators (plus antennas for air interfaces) can provide coupling for transmission waveforms, and packetizers can provide formats for transmission over networks such as the Internet; see FIG. 3C.

2. Sampling Theorem Background

Preliminarily, recall that the sampling theorem provides for exact reconstruction of x(t), a finite-energy and bandwidth-limited signal, from samples taken at the Nyquist rate or a higher rate. In particular, with x(t) in the subspace of L²(R) of functions with spectrum limited to −1/(2T)<f<1/(2T) where f is the frequency variable and T is a sampling interval (so the sampling rate is 1/T): x(t)=Σ_(nεz) x(nT)sin c(t/T−n) where sin c(u)=sin(πu)/πu and Z denotes the set of integers.

The samples can be written in terms of an inner product in L²(R) as follows. Let X(f) be the Fourier transform of x(t); then using the limit on the bandwidth of x(t) gives

$\begin{matrix} {{x\left( {n\; T} \right)} = {\int_{{- \infty},\infty}^{\;}{{X(f)}{\mathbb{e}}^{2\pi\;{\mathbb{i}}\;{fnT}}\ {\mathbb{d}f}}}} \\ {= {\int_{{- \infty},\infty}^{\;}{{X(f)}{{rect}({fT})}{\mathbb{e}}^{2\pi\;{\mathbb{i}}\;{fnT}}\ {\mathbb{d}f}}}} \end{matrix}$ where rect(u) denotes the rectangle function which equals 1 for −½<u<½ and 0 elsewhere. Then applying Parseval's theorem gives: x(nT)=∫_(−∞, ∞) x(t)(1/T)sin c(t/T−n)dt where we used the Fourier transform of (1/T)sin c(t/T−n) is rect(fT)e^(−2πi f nT). Then the integral can be interpreted as the inner product in L²(R): x(nT)=

x(t)|(1/T)sin c(t/T−n)

Note that the functions sin c(t/T−n) are orthogonal:

$\begin{matrix} {\left\langle {{\sin\;{c\left( {{t/T} - m} \right)}}❘{\sin\;{c\left( {{t/T} - n} \right)}}} \right\rangle = {T^{2}{\int_{{- \infty},\infty}^{\;}{{{rect}({fT})}{\mathbb{e}}^{{- 2}\pi\;{\mathbb{i}}\;{fmT}}}}}} \\ {{rect}({fT}){\mathbb{e}}^{2\pi\;{\mathbb{i}}\;{fnT}}\ {\mathbb{d}f}} \\ {= {T^{2}{\int_{{- \infty},\infty}^{\;}{{{rect}({fT})}{\mathbb{e}}^{{- 2}\pi\;{\mathbb{i}}\;{f{({n - m})}}T}{\mathbb{d}f}}}}} \\ {= {T\;\sin\;{c\left( {n - m} \right)}}} \\ {= {T\;\delta_{m,n}}} \end{matrix}$ Thus the sampling theorem states that the set of functions sin c(t/T n) for n=0, ±1, ±2, . . . constitutes an orthogonal basis for H, the subspace of L²(R) consisting of functions with spectrum in 1/(2T)<f<1/(2T). That is, the sampling theorem for H provides: x(t)=Σ_(nεZ)

x(.)|φ_(analysis, n)(.)

φ_(syntheses, n)(t) where the synthesis functions are φ_(synthesis, n)(t)=sin c(t/T−n) and the analysis functions are φ_(analysis, n)(t)=(1/T)sin c(t/T−n) for the space of bandlimited signals.

This may also be interpreted as the analysis linear mapping F: H→l²(Z) with F(x(t))=x where the components of vector x, x_(n)=

x(t)|φ_(analysis, n)(t)

, are the samples of x(t), together with the adjoint synthesis linear mapping F*: l²(Z)→H defined by F*(c)=Σ_(nεZ) c_(n) φ_(synthesis, n)(t). Of course, F is one-to-one but not onto, F* is not one-to-one, and the operator norm of F depends upon the sampling rate: ∥F(x(t))∥²=(1/T)∥x(t)∥².

3. Oversampling

For sampling x(t) in H at rates higher than the Nyquist rate, such as oversampling by a (redundancy) factor of r>1, the corresponding samples, x(nT/r), could still be expressed as inner products of x(t) with the corresponding analysis functions (1/T)sin c(t/T−n/r), and x(t) could still be expanded as a sum of synthesis functions (1/r)sin c(t/T−n/r) where the 1/r factor comes from the fact that each of r subsets of the functions sin c(t/T−n/r) with the same n mod r suffices (by the sampling theorem) to reconstruction x(t) and so the total sum gives r times x(t). However, these sets of functions are overcomplete in the sense that the expansion coefficients are not unique for x(t) in H, and the functions no longer form orthogonal sets:

sin c(t/T−m/r)sin c(t/T−n/r)

=T sin c(n/r−m/r)

As a simple example, for r=2 and T=1 the analysis functions are the sin c(t−n/2). Then the samples of x(t) at integers and half-integers are: x(m)=

x(t)−sin c(t−m)

x(m+½)=

x(t)sin c(t−(m+½))

The sampling theorem provides that each of these sets of samples is sufficient for reconstruction:

$\begin{matrix} {{x(t)} = {\sum\limits_{m \in Z}^{\;}\;{{x(m)}\sin\;{c\left( {t - m} \right)}}}} \\ {= {\sum\limits_{m \in Z}^{\;}\;{2{x(m)}\left( {1/2} \right)\sin\;{c\left( {t - m} \right)}}}} \end{matrix}$ $\begin{matrix} {{x(t)} = {\sum\limits_{m \in Z}^{\;}\;{{x\left( {m + {1/2}} \right)}\sin\;{c\left( {t - \left( {m + {1/2}} \right)} \right)}}}} \\ {= {\sum\limits_{m \in Z}^{\;}\;{2{x\left( {m + {1/2}} \right)}\left( {1/2} \right)\sin\;{c\left( {t - \left( {m + {1/2}} \right)} \right)}}}} \end{matrix}$ Thus the expansion in terms of the overcomplete set is not unique; indeed x(t)=Σ_(n εZ) c_(n)(½)sin c(t−n/2) where a first possible set of expansion coefficients is: c_(n)=x(n/2) (which is the vector F(x(t))), a second possible set is c_(n)=2x(n/2) for n even and c_(n)=0 for n odd; and a third possible set is: c_(n)=0 for n even and c_(n)=2x(n/2) for n odd. In fact, the set of expansion coefficients c_(n)=x(n/2) for n even and c_(n)=x(n/2) for n odd makes the sum 0; that is, Σ_(n εZ)(−1)^(n) x(n/2)(½)sin c(t−n/2)=0. 4. Frames

In general, let H denote L²(R) or a subspace of L²(R); then a set {φ_(j)}_(jεJ) is a frame of H if there exist two positive numbers A₁, A₂ such that for any x in H we have A ₁ ∥x∥ ²≦Σ_(jεJ) |

x|φ _(j)

|² ≦a ₂ ∥x∥ ²

A frame is called “tight” when A₁=A₂. The examples of the preceding sections, φ_(j)=sin c(t/T−j/r), are tight frames with A=rT . Frames can be viewed as a generalization of bases of a Hilbert space. If {φ_(j)}_(jεJ) is a frame, then any element x in H can be expressed as x=Σ_(jεj) c_(j) φ_(j) where the expansion coefficients c={c_(j)}_(jεJ) may be considered an element of l²(Z) but may not be unique if the frame is overcomplete, such as for the example of preceding section 3.

The frame analysis operator, F: H→l²(Z), and its adjoint (frame synthesis) operator, F*: l²(Z)→H, are defined by: F(x)={

x|φ _(j)

}_(jεJ) F*(c)=Σ_(jεJ) c _(j) φ_(j) Note that when h={h_(j)}_(jεJ) is in ker(F*), the kernel (null space) of F*, then c+h also is a set of expansion coefficients for x: x=Σ _(jεJ)(c _(j) +h _(j))φ_(j) Of course, h being in ker(F*) is the same as h being perpendicular to ran(F), the range of F, in l²(Z). Note that ran(F) is the subspace of vectors which consist of samples of functions in H.

Define the frame operator, F*F: H→H, so F*F(x)=Σ_(jεJ)

x|φ_(j)

φ_(j). Then the dual frame, {φ_(dual j)}_(jεJ), is defined as the frame operator inverse image of the frame: φ_(dual.j)=(F*F)⁻¹ φ_(j) For tight frames we have φ_(dual.j)=φ_(j)/A where A is the frame bound.

One possible set of expansion coefficients of particular importance is the minimum-norm-in-l²(Z) solution, which is computed as: ĉ _(j) =

x|φ _(dual)

Any other set of expansion coefficients for x can be expressed as c=ĉ+Σ _(n) γ_(n) v ^((n)) where the v^((n)) form an orthonormal basis for ker(F*) as a subspace of l²(Z). 5. Quantization for Reconstruction

Quantization of samples for an oversampling ADC corresponds to quantization of expansion coefficients of the signal x(t) with respect to the frame corresponding to the oversampling (e.g., sin c(t/T−n/r)); and the preferred embodiment methods minimize the reconstruction error due to the quantization of the frame expansion coefficients. First, define the quantization error for vector c in l²(Z) as the vector Q_(error)(c) in l²(Z) Q _(error)(c)=c+Σ _(n) γ_(n) v^((n))−(└c┘+α) where └.┘ denotes the floor function (largest integer not greater than) and α is an integer vector that corresponds to the quantization approximation. For example, when Q(c_(j))=└c_(j)┘, then α_(j)=0; whereas, when Q(c_(j))=┌c_(j)┐, then α_(j)=1. ┌.┐ denotes the ceiling function (smallest integer not less than), and the c_(j) are presumed normalized by the quantization step size. In effect, Q(c_(j))=└c_(j)┘+α_(j), and Q_(error)(c)=c−Q(c)+Σ_(n) γ_(n) v^((n)) where, as before, the v^((n)) form an orthonormal basis for ker(F*). Let e=c−└c┘ so e is the integer truncation error vector. Then the reconstruction error in H is:

$\begin{matrix} {e^{({rec})} = {F*\left( {Q_{error}(c)} \right)}} \\ {= {F*\left( {e - \alpha} \right)}} \\ {= {\sum\limits_{j \in J}^{\mspace{14mu}}\;{\left( {e_{j} - \alpha_{j}} \right)\varphi_{j}}}} \end{matrix}$ Therefore the reconstruction optimization problem can be written as minimizing the objective function

$\begin{matrix} {\Psi = {e^{({rec})}}^{2}} \\ {= \left\langle {{F*\left( {e - \alpha} \right)}❘{F*\left( {e - \alpha} \right)}} \right\rangle} \\ {= \left\langle {{L^{({rec})}\left( {e - \alpha} \right)}❘{e - \alpha}} \right\rangle} \end{matrix}$ where L^((rec))=FF*. The following sections 9-11 provide approximate solutions of the optimization problem. 6. Minimizing Error After Linear Operation

If the frame coefficients undergo a known linear operation, e.g., linear filtering, prior to frame synthesis, then the above formulation needs to be modified to accommodate this linear operation. Assume that a linear operator S: l²(Z)→l²(Z) is applied to the original frame coefficients c. The reconstructed signal has the general form: x=F*S(c) Therefore the earlier formulation of the optimization problem still applies and we get the optimization model:

$\begin{matrix} {\Psi = \left\langle {{F*{S\left( {e - \alpha} \right)}}❘{F*{S\left( {e - \alpha} \right)}}} \right\rangle} \\ {= \left\langle {{L^{({linear})}\left( {e - \alpha} \right)}❘{e - \alpha}} \right\rangle} \end{matrix}$ where L ^((linear)) =S*FF*S

Where S* is the adjoint operator of S. Note that with the above formulation, we minimize the reconstruction error after the linear operator even though the quantization takes place before the linear operator.

7. Minimizing Quantization Error

The objective function for minimization of the quantization error in l²(Z) is:

$\begin{matrix} {\Psi = {{Q_{error}(c)}}^{2}} \\ {= {{\gamma + \left( {e - \alpha} \right)}}^{2}} \end{matrix}$ where γ=Σ_(n) γ_(n) v^((n)) is in ker(F*). For any value of a the value of γ that yields the minimization is equivalent to the projection of e−α onto ker(F*) in l²(Z); i.e., γ_(j) =−

e−α|v ^((j))

Define the projection operation P: l²(Z) ker(F*) as P(c)=Σ_(j)

c|v ^((j))

v ^((j)) Then the objective function can be rewritten as: Ψ∥(I−P)(e−α)∥² Note that I−P is the orthogonal projection operator onto ran(F); hence, it is a self-adjoint operator. Therefore, the optimization problem has the same form as the previous but with

$\begin{matrix} {L^{({quant})} = \left( {I - P} \right)^{2}} \\ {= \left( {I - P} \right)} \end{matrix}$ Note that: ∥e ^((rec)) ∥=∥F*Q _(error)(c)∥≦∥F*∥∥Q _(error)(c)∥ where ∥F*∥ denotes the usual operator norm sup_(∥c∥=1)∥F*(c)∥. Thus there is a direct proportion between the minimization of the quantization error and the minimization of the reconstruction error. 8. Optimization

The optimizations of preceding sections 5-7 all have the same form with an operator L which is a positive, self-adjoint operator in l²(Z). L is also a compact operator because it is the limit of a sequence of finite-rank approximation operators due to the frame definition upper bound; e.g., for L^((rec)) we have L ^((rec))(c)_(k) =

Σ _(jεJ) c _(j) φ_(j)|φ

Next, define the finite rank approximation L_(n) as

$\begin{matrix} {{L_{n}(c)}_{k} = {{\left\langle {{\sum\limits_{j \leq n}^{\;}\;{c_{j}\varphi_{j}}}❘\varphi_{k}} \right\rangle\mspace{14mu}{if}\mspace{14mu} k} \leq n}} \\ {= {{0\mspace{14mu}{if}\mspace{14mu} k} > n}} \end{matrix}$ Then $\begin{matrix} {{\left( {L^{({rec})} - L_{n}} \right)(c)_{k}} = {{\left\langle {{\sum\limits_{j > n}^{\;}\;{c_{j}\varphi_{j}}}❘\varphi_{k}} \right\rangle\mspace{14mu}{if}\mspace{14mu} k} \leq n}} \\ {= {{{L^{({rec})}(c)}_{k}\mspace{14mu}{if}\mspace{14mu} k} > n}} \end{matrix}$ The operator norm is: ∥(L ^((rec)) −L _(n))∥=sup_(∥c|=1)∥(L ^((rec)) −L _(n))(c)∥ So squaring and using the frame definition:

$\begin{matrix} {{\left( {L^{({rec})} - L_{n}} \right)}^{2} = {\sup_{{c} = 1}{{\left( {L^{({rec})} - L_{n}} \right)(c)}}^{2}}} \\ {= {\sup_{{c} = 1}{\sum\limits_{i,{j > n}}^{\;}\;{{c_{j}\left\langle {\varphi_{i}❘\varphi_{j}} \right\rangle}}^{2}}}} \\ {< {A_{2}\max\mspace{14mu}\left( {\varphi_{j}}^{2} \right){\sum\limits_{i > n}^{\;}\;{c_{i}}^{2}}}} \end{matrix}$ And $\left. {\sum\limits_{i > n}^{\;}\;{c_{i}}^{2}}\rightarrow 0. \right.$

The unknown variable (i.e., quantization method) is the integer vector α. The objective function for reconstruction error minimization can be put into an alternate form (after removing the irrelevant term

Le|e

): Ψ^((rec)) =

L ^((rec))α|α

—2 Re{

L ^((rec)) α|e

} Note that the reconstruction mean squared error (MSE) of the rounding solution does not improve with increased redundancy after a linear operator. This is primarily because a general linear operator moves components of the kernel of the quantization error into the domain of the synthesis operator. This situation is avoided in the formulation of the optimization problem with the operator L^((linear)) to get consistent improvement.

Since each of the operators L is a compact, positive, self-adjoint operator, it has positive real eigenvalues and the eigenvectors that correspond to distinct eigenvalues are orthogonal. Moreover, the space spanned by the eigenvectors represents the domain of L, (i.e., the orthogonal space is ker(L)). Let {w_(j)} be an orthonormal basis of the eigenvectors of L so that L w_(j)=λ_(j) w_(j) where λ_(j) is real and positive. Let η_(j)=

α|w_(j)

and ζ_(j)=

e|w_(j)

; then the objective function becomes:

$\begin{matrix} {\Psi^{({rec})} = {\left\langle {{\sum\limits_{j}^{\;}\;{\lambda_{j}\eta_{j}w_{j}}}❘\alpha} \right\rangle - {2\;{Re}\left\{ \left\langle {{\sum\limits_{j}^{\;}\;{\lambda_{j}\eta_{j}w_{j}}}❘e} \right\rangle \right\}}}} \\ {= {\sum\limits_{j}^{\;}\;{\lambda_{j}\left( {{\eta_{j}}^{2} - {2\;{Re}\left\{ {\eta_{j}\zeta_{j}^{*}} \right\}}} \right)}}} \end{matrix}$

In the finite dimensional case a positive self-adjoint operator can be represented by a positive semidefinite matrix. The optimization problem in this case is reduced to a standard constrained quadratic integer programming problem (with the constraint that the last element of any feasible solution is one). For infinite dimensional frames we still have the same model but with infinite number of variables. However, we add the constraint that any feasible solution should be in l²(Z). Therefore we assume that there exists a positive integer N such that α_(j)=0 for j>N.

The optimal solution of the above optimization problem can be computed directly using enumeration for small frame size. For moderate sizes, (e.g., less than 100), the problem can be transformed into a semidefinite programming problem to find a suboptimal solution. However, as size grows it can only be solved using numerical search techniques, e.g., Tabu search. In the following section we describe an efficient heuristic that exploits the optimization model for the overcomplete frames to significantly prune the search space.

9. Approximation

The search procedure described in this section is a reformulation of the objective function for reconstruction error. In particular, we describe a search technique that minimizes the correlation between successive incremental vectors by exploiting the frame redundancy. Since the operator L is a positive self-adjoint operator, it can be written as L=B² where B is a bounded self-adjoint operator. Let {u_(j)} be the elementary orthonormal basis of l²(Z), i.e., δ_(i, j). Also, define: e _(R) =B(e) b _(j) =B(u _(j)) Then using L=B²=B*B, the objective function can be written as:

$\begin{matrix} {\Psi^{({rec})} = {e^{({rec})}}^{2}} \\ {= \left\langle {{L\left( {e - \alpha} \right)}❘{e - \alpha}} \right\rangle} \\ {= \left\langle {{B\left( {e - \alpha} \right)}❘{B\left( {e - \alpha} \right)}} \right\rangle} \\ {= {{{B(e)} - {B(\alpha)}}}^{2}} \\ {= {{e_{R} - {\sum\limits_{j \in J}^{\;}\;{\alpha_{j}b_{j}}}}}^{2}} \end{matrix}$ For overcomplete frames, the set of vectors {b_(j)} are linearly dependent. This redundancy can be exploited to prune the search space of the optimization problem. This is a clustering problem that is discussed in details in the following sections. 10. Clustering Procedure

The objective of the clustering procedure is to cluster the possibly infinite set {b_(j)} from the preceding section to at most M clusters such that the mutual correlations between clusters are minimized. Moreover, each cluster is to have two representatives: one is a vector in l²(Z) that is used in testing the search increments and the second is a binary vector for elements of a.

In particular, given a set of vectors {b_(j)}, the preferred embodiment methods generate M clusters with minimal mutual correlation between clusters. Note that in practice we have the original unquantized sequence c in l²(Z); hence c_(j)→0; i.e., for each ε>0, there exists N(ε) such that |c_(j)|<ε for j>N(ε), which is also dependent upon c. If we choose c to be much smaller than a single quantization step, we may assume that α_(j)=0 for j>N(ε). Therefore, we have N vectors and are required to generate M clusters. The solution space of the optimization problem has 2^(N) possible binary solutions a. Each cluster corresponds to a search increment in the quantization phase. The preferred embodiment clustering procedure is to minimize the correlations between successive search increments by reducing the overall correlation between clusters. The clustering procedure proceeds as follows:

1) Generate clusters in multiples of S≦N; i.e., we generate S clusters at each iteration.

2) The initial centers of the S clusters at each iteration are an orthonormal basis for l²(Z) denoted {g_(j)}.

3) The first vector b_(j) that is classified to the k-th cluster is the vector that maximizes the normalized correlation |

g_(k)|b_(j)

|/∥b_(j)∥. The cluster vector representative becomes b_(j) and the binary representative has 1 only in the j-th location and 0 elsewhere. Denote the cluster representative by h_(k); i.e., initially h_(k)=b_(j).

4) For each other vector b_(i), include it in the k-th cluster when |

g _(k) |b _(i) ±h _(k)

|/∥b _(i) ±h _(k) ∥<|

g _(k) |h _(k)

|/∥h _(k)∥. In this case the cluster representative is updated to b_(i)±h_(k) and the i-th location in the binary representative vector is updated to 1 with the proper sign.

5) The set of S initial orthonormal vectors {g_(j)} can be generated using Gram-Schmidt procedure on {b_(j)} with different vector order, or by rotating the eigenvectors of the linear operator by a fixed angle using a rotation operator.

Note that the complexity of the clustering procedure is not a critical issue sin ce it is performed offline.

In the finite dimensional case, e.g., R^(N), the linear operator is approximated by a positive semidefinite matrix Γ, which in general can be expressed as Γ=B², where B is a hermitian matrix that can be directly computed using a unitary transformation of the Cholesky decomposition of Γ. In this case the vectors {b_(j)} are simply the columns of B. The number of nonzero vectors equals rank (B) and this equals the maximum number of clusters within a group.

11. Quantization Procedure

At the end of clustering procedure, we have M clusters each with one representative vector {h_(j)} j=1, 2, . . . , M and one binary vector for the increment {μ_(j)} j=1, 2, . . . , M. The quantization search proceeds as follows:

-   1) compute e_(R) as B(e), and set α=0. -   2) for j=1,2, . . . M, if ∥e_(R)±h_(j)∥<∥e_(R)∥, then replace e_(R)     with e_(R)±h_(j), and update binary vector α to α±μ_(j). -   3) Step 2 is repeated as needed or until convergence.     Note that the comparison in step 2) can be simplified to:     |     e _(R) |h     |<(½)∥h _(j)∥²     The total computation overhead for the preferred embodiment is as     follows: -   1) Linear operator computation for computing e_(R) as B(e). -   2) M inner product operations for the search |     e_(R)|h_(j)     |<(½)∥h_(j)∥².

In summary, the preferred embodiment methods include the steps of:

-   -   (a) given a frame {φ_(n)}, compute (offline) the b_(j)=B(u);     -   (b) cluster (offline) the b_(j) into M clusters with cluster         vectors h_(k) and increment vectors μ_(k).     -   (c) compute frame expansion coefficients c_(j) for an input         signal (i.e., oversampling); the input signal typically would be         partitioned into overlapping blocks so the computation is finite         and limited delay;     -   (d) quantize the c_(j) by the following steps (i) to (iii):         -   (i) compute e_(j)=c_(j)−└c_(j)┘ and e_(R)=B(e);         -   (ii) for j=1, 2, . . . , M: when ∥e_(R)±h_(j)∥<∥e_(R)∥,             update e_(R) to e_(R)±h_(j) and α to α≠μ_(j); (α was             initialized as the 0 vector); this cycling through the             clusters is repeated until convergence (i.e., no more             updates).         -   (iii) quantize c_(j) as └c_(j)┘+α_(j).             FIG. 1B heuristically illustrates the preferred embodiments:             the input analog signal x and two reconstructed analog             signals x^((rec1)) and x^((rec2)) in H are shown in the             lower portion, and the oversampled F(x) plus two possible             quantizations in l²(Z) are shown in the upper portion. The             preferred embodiments find the quantization to minimize             reconstruction error.             12. Additional

Section 11 described a search technique that is based on minimizing the correlation between successive moves. Unlike the problem of classical integer programming problem where we deal with binary vectors that do not reveal much information, the preferred embodiment uses a clustering technique to work with meaningful vectors to the optimization problem that are the representative for each cluster.

In the case where the delay is a concern (especially in the infinite dimensional case) the clusters can be designed such that the representative binary vector of each cluster is zero outside the allowable delay. Also, the error in a group of S clusters can be reduced by correcting for the additional errors in the following groups of clusters in a procedure that could be considered a generalization of the projection procedure onto the next frame sample that was described in Boufounos and Oppenheim, cited in the background.

As an infinite-dimensional example, consider I^((rec))=FF* of section 5 with T=1 for notational simplicity; then φ_(n)(t)=(1/r)sin c(t−n/r) is the synthesis frame and φ_(dual, n)(t)=sin c(t−n/r) is the analysis frame. Compute B as follows. First, L^((rec))(u_(j))_(k)=FF*(u_(j))_(k)=F(φ_(j)(t))_(k)=

φ_(dual, k)(t)|φ_(j)(t)

=(1/r)sin c(k/r−j/r). Next, presume B is translation invariant like L: B(c)_(k)=Σ_(nεZ) β_(k-n) c _(n) Then, L^((rec))(u_(j))_(k)=BB(u_(j))_(k) implies (1/r)sin c(k/r−j/r)=Σ_(nεZ) β_(k-n) β_(n-j) or with a variable change: (1/r)sin c(m/r)=Σ_(nεZ) β_(m-n) β_(n) Next, Fourier transform l²(Z)→L²(−½, ½) to convert the convolution into a multiplication. Then note the Fourier transform of (1/r)sin c(n/r) is rect(rθ) to have β_(n)=(1/r)sin c(n/r) So the vectors for the cluster computations are:

$\begin{matrix} {{b_{j}(k)} = {B\left( u_{j} \right)}_{k}} \\ {= {\left( {1/r} \right){{sinc}\left( {{k/r} - {j/r}} \right)}}} \end{matrix}$ Then

b_(m)|b_(n)

=Σ_(kεz)(1/r)sin c(m/r−k/r)(1/r)sin c(n/r−k/r)=(1/r)sin c(m/r−n/r), so ∥b_(j)∥²=1/r. Intuitively, the oversampling scales down the vector components by a factor of (1/r) to compensate for the redundancy factor r in the expansion; and heuristically the (infinite) dimension of ran(F) is 1/r of the total dimension, so for a random vector the projection onto ran(F) decreases the norm by a factor of 1/√r.

The orthonormal basis {g_(j)} could then be taken as g_(j)=(√r)b_(jr). Then the first vector in the k-th cluster would be b_(kr) and thus be the initial h_(k).

For the k-th cluster, vector b_(i) is included when: |

g _(k) |b _(i) ±h _(k)

|/∥b _(i) ±h _(k) ∥>|

g _(k) |h _(k)

|/∥h _(k)∥. Squaring both sides, cross multiplying, and dividing out the square root of r normalization converts the condition into: |

b _(kr) |b _(i) ±b _(kr)

|² ∥b _(kr)∥² >|

b _(kr) |b _(kr)

|² ∥b _(i) ±b _(kr)∥². Divide out ∥b_(kr)∥²=|

b_(kr) |b _(kr)

² The left side then equals:

$\begin{matrix} {{\left\langle {b_{kr}❘{b_{i} \pm b_{kr}}} \right\rangle }^{2} = {{\left\langle {b_{kr}❘b_{i}} \right\rangle \pm \left\langle {b_{kr}❘b_{kr}} \right\rangle}}^{2}} \\ {= {\left( {1/r^{2}} \right){{{{sinc}\left( {{{kr}/r} - {i/r}} \right)} \pm 1}}^{2}}} \\ {= {\left( {1/r^{2}} \right)\left\lceil {{{{sinc}\left( {{{kr}/r} - {i/r}} \right)}^{2} \pm {2\;{{sinc}\left( {{{kr}/r} - {i/r}} \right)}}} + 1} \right\rceil}} \end{matrix}$ The right side equals

$\begin{matrix} {{{b_{i} \pm b_{kr}}}^{2} = {{{b_{i}}^{2} \pm {2\left\langle {b_{i}❘b_{kr}} \right\rangle}} + {b_{kr}}^{2}}} \\ {= {\left( {1/r^{2}} \right)\left\lbrack {{1 \pm {2\;{{sinc}\left( {{{kr}/r} - {i/r}} \right)}}} + 1} \right\rbrack}} \end{matrix}$ The left side is always smaller because sin c(kr/r−i/r)²<1; thus the clusters have a single representative, and the quantization search is simply over the vectors corresponding to samples of lowpass-filtered pulses at the Nyquist rate. 13. Experimental Results

In our simulation we use a finite dimensional Hilbert space generated by the Fourier series coefficients of periodic band-limited functions as in Goyal et al., Quantized Overcomplete Expansions in R^(N): Analysis, Synthesis, and Algorithms, 44 IEEE Tran. Info Theory 16 (January 1998). In particular, a real-valued function x(t) with period T, integrable over [0, T], and with maximum frequency W can be expanded in a finite Fourier series: x(t)=X(1)+Σ_(1≦k≦W) X(2k)√2 cos [2πkt/T]+X(2k+1)√2 sin [2πkt/T] Thus take the Hilbert space H as R^(N) where N=2W+1 and the function x(t) is represented by the vector

$\begin{bmatrix} {X(1)} \\ {X(2)} \\ {X(3)} \\ \ldots \\ {X(N)} \end{bmatrix}$ in H. Now samples of x(t) at M equi-spaced points in [0, T] can be written as: x(mT/M)=X(1)+Σ_(1≦k≦W) X(2k)√2 cos [2πkm/M]+X(2k+1)√2 sin [2πkm/M] where m=0, 1, 2, . . . , M−1 (or equivalently by periodicity, m=1, 2, . . . , M). Denote these samples as a vector c in R^(M) (which replaces l²(Z) in this finite-dimensional case) with components c_(m)=x(mT/M). Oversampling corresponds to M>N. And the samples can be written as inner products in H:

c_(m) = ⟨X❘φ_(m)⟩ where $\varphi_{m} = \begin{bmatrix} 1 \\ {\sqrt{2}{\cos\left\lbrack {2\;\pi\;{m/M}} \right\rbrack}} \\ {\sqrt{2}{\sin\left\lbrack {2\;\pi\;{m/M}} \right\rbrack}} \\ \ldots \\ {\sqrt{2}{\sin\left\lbrack {2\;\pi\;{{Wm}/M}} \right\rbrack}} \end{bmatrix}$ The {φ_(m)} form the frame in H, and then the frame analysis operator F: H→R^(M) maps X→c where c_(m)=

X|φ_(m)

. Thus F corresponds to the M×N matrix with elements F_(m, 1)=1, F_(m, 2k)=√2 cos [2πkm/M], and F_(m, 2k+1)=√2 sin [2πkm/M]; that is, the m-th row of F is φ_(m). The frame synthesis operator F* : R^(M)→H maps c→Y where Y(1)=Σ_(1≦m≦M) c _(m), Y(2k)=Σ_(1≦m≦M) c _(m) √2 cos [2πkm/M], Y(2k+1)=Σ_(1≦m≦M) c _(m) √2 sin [2πkm/M], F* corresponds to the N×M matrix which is the adjoint of the F matrix so the columns of F* are the {φ_(m)}, and F*F=M I_(33 N). Further, the optimization operator L=FF*: R^(M)→R^(M) corresponds to the M×M matrix with elements

φ_(n)|φ_(m)

. Note that

⟨φ_(n)❘φ_(m)⟩ = 1 + 2(cos [2 π m/M]cos [2 π n/M] + sin [2 π m/M]sin [2 π n/M]) + 2(cos [2 π 2 m/M]cos [2 π 2 n/M] + sin [2 π 2 m/M]sin [2 π 2 n/M]) + … + 2(cos [2 π km/M]cos [2 π kn/M] + sin [2 π km/M]sin [2 π kn/M]) + … + 2(cos [2 π Wm/M]cos [2 π Wn/M] + sin [2 π Wm/M]sin [2 π Wn/M]) Then using the cosine addition formula:

$\begin{matrix} {\left\langle {\varphi_{n}❘\varphi_{m}} \right\rangle = {1 + {2\;{\cos\left\lbrack {2\;{{\pi\left( {m - n} \right)}/M}} \right\rbrack}} +}} \\ {{2\;{\cos\left\lbrack {2\;\pi\; 2{\left( {m - n} \right)/M}} \right\rbrack}} + \ldots +} \\ {{2\;{\cos\left\lbrack {2\;\pi\;{{k\left( {m - n} \right)}/M}} \right\rbrack}} + \ldots +} \\ {2\;{\cos\left\lbrack {2\;\pi\;{{W\left( {m - n} \right)}/M}} \right\rbrack}} \\ {{= {{\sin\left\lbrack {2{\pi\left( {W + {1\text{/}2}} \right)}{\left( {m - n} \right)/M}} \right\rbrack}/{\sin\left\lbrack {2\;{\pi\left( {1\text{/}2} \right)}{\left( {m - n} \right)/M}} \right\rbrack}}}\;} \\ {= {{\sin\left\lbrack {\pi\;{{N\left( {m - n} \right)}/M}} \right\rbrack}/{\sin\left\lbrack {{\pi\left( {m - n} \right)}/M} \right\rbrack}}} \end{matrix}$

The test vectors in R^(N) are Gaussian independent, identically-distributed sequences with unity variance. The frame functions {φ_(m)} in the synthesis process are not quantized. In the first experiment, we establish the effectiveness of the preferred embodiment search method as described in section 11. We fix N=4, and increase the redundancy factor r. We set the number of reduced search vectors in R^(N) to M=N×r (compared to the total search space of 2^(M)). The M search vectors are generated as described in section 10. In a single iteration we have M search steps as described in section 11. To evaluate the preferred embodiment method, we simulate the results of Boufounos and Oppenheim (cited in the background) for frame quantization by projecting the quantization error onto the following frame coefficient. The results are shown in FIG. 2A.

The preferred embodiment search method gives significant improvement in the reconstruction mean square error (MSE) over the rounding solution. Note that the rounding solution has an error behavior almost O(1/r) which is proportional to the dimension of ker(F*). The preferred embodiment search method with the described parameters performs better than the projection algorithm in Boufounos and Oppenheim which simulates the sigma-delta quantization for redundant frames, therefore, the SNR is O(1/r³) as it resembles the error behavior of a single stage sigma-delta converter. On the other hand, the error behavior of the preferred embodiment is approximately O(1/r⁴) which is a significant improvement. This performance is typical for any frame order. One other important point from the figure is that most of the improvement occurs in the first iteration. This is due to the efficient procedure for constructing the search movements with orthogonal sets. Note that, the described order of iterations of the preferred embodiment search method is several orders of magnitude less than typical search algorithms for integer programming problems.

Next we test the performance when the frame coefficients undergo a linear operation. In particular, we multiply the frame coefficients in R^(M) by an M×M matrix with uniformly distributed entries. We use the optimization model described in section 6. The results are illustrated and compared with the rounding solution in FIG. 2B which has similar behavior to the first experiment.

Note that the reconstruction MSE of the rounding solution does not improve with increased redundancy after a linear operator. This is primarily because a general linear operator moves kernel components of the quantization error to the domain of the synthesis operator. This situation is avoided in the formulation of the optimization problem in section 6 to get the consistent improvement. 

What is claimed is:
 1. A quantization method for enhancing an image, comprising: (a) providing a vector c of frame expansion coefficients c_(j) for an input signal, where the c_(j) are expressed as multiples of a quantization step size; (b) computing a vector of truncation errors e where for each coefficient e_(j) =c_(j)−floor(c_(j)), where floor(c_(j)) denotes the integer part of c_(j); (c) computing e_(R) =B(e), where B is a projection onto a subspace of vectors of possible expansion coefficients; (d) initializing a binary vector α as the 0 vector and provide M clusters of vectors in said subspace where said clusters are of correlated vectors and a j-th cluster has representative vector h_(j) and binary increment vector μ_(j); (d) for j =1, 2, . . . , M: when ∥e_(R)±h_(j)∥<∥e_(R)∥, updating e_(R) to e_(R)±h_(j) and update α to α±μ_(j); (e) repeating step (d) until there are no more updates; and (f) outputting for each coefficient c_(j) the sum floor(c_(j)) +α_(j); (g) utilizing the outputted sum floor(c_(j)) +α_(j) for each coefficient c_(j) in quantization, wherein said quantization produces an enhanced image.
 2. The method of claim 1, wherein the method is utilized for oversampled signals with an error minimization searches. 